Let’s check out the Self-Efficacy models.
First, let’s see how differences in major influence the
average of the Self-Efficacy questions:
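The output below comes from fitting a Gaussian GLM of the Self-Efficacy average on major. A minimal sketch of the call (the formula and data frame name `df` are taken from the `Call:` line in the output; the model name `se_mod_1` matches the comparison tables later on):

```r
# Gaussian GLM (equivalent to lm) of average Self-Efficacy on major.
# Assumes df contains self_efficacy_ave (numeric) and major (factor).
se_mod_1 <- glm(self_efficacy_ave ~ major, data = df)

# Print coefficients, significance codes, deviance, and AIC.
summary(se_mod_1)
```

The three models that follow use the same pattern, swapping `major` for `career`, `ethnicity`, and `med_condition`.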
##
## Call:
## glm(formula = self_efficacy_ave ~ major, data = df)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.2592 -0.3766 0.2234 0.7408 1.2375
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.25915 0.11797 36.104 < 2e-16 ***
## majorChemistry -0.31630 0.39379 -0.803 0.422482
## majorComputer_sci 0.14085 0.39379 0.358 0.720844
## majorEarth_sci 0.74085 1.00100 0.740 0.459809
## majorEngineering 0.02085 0.28247 0.074 0.941222
## majorHealth_sci -0.48253 0.14259 -3.384 0.000808 ***
## majorMathematics -0.15915 0.71271 -0.223 0.823446
## majorNon_STEM -0.34611 0.23849 -1.451 0.147739
## majorOther -0.49665 0.21165 -2.347 0.019588 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.9880846)
##
## Null deviance: 316.14 on 311 degrees of freedom
## Residual deviance: 299.39 on 303 degrees of freedom
## AIC: 892.55
##
## Number of Fisher Scoring iterations: 2
This shows that Health Science majors and those in the Other category
report significantly lower average Self-Efficacy than the reference
group of majors.
This is useful, but it is difficult to interpret in context
until we compare it with the other models.
Now, let’s look at how career goals affect the average of
Self-Efficacy questions:
##
## Call:
## glm(formula = self_efficacy_ave ~ career, data = df)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.1028 -0.3426 0.2500 0.6972 1.2574
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.1022222 0.1491172 27.510 <2e-16 ***
## careereducator -0.2868376 0.3149707 -0.911 0.3632
## careerengineer 0.0844444 0.2982344 0.283 0.7773
## careerhealth_care_pro -0.3595993 0.1744641 -2.061 0.0401 *
## careermedical_doctor 0.0005556 0.1900879 0.003 0.9977
## careernon_stem -0.3522222 0.2688249 -1.310 0.1911
## careerresearcher 0.4120635 0.4064250 1.014 0.3115
## careerscientist -0.0355556 0.3249934 -0.109 0.9130
## careertechnician 0.1977778 0.4347476 0.455 0.6495
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 1.000617)
##
## Null deviance: 316.14 on 311 degrees of freedom
## Residual deviance: 303.19 on 303 degrees of freedom
## AIC: 896.48
##
## Number of Fisher Scoring iterations: 2
This shows that those pursuing a Healthcare Professional career
report significantly lower average Self-Efficacy than the reference
career group.
Now let’s approach this from another angle:
how do different ethnicities relate to Self-Efficacy?
##
## Call:
## glm(formula = self_efficacy_ave ~ ethnicity, data = df)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.0077 -0.4077 0.1556 0.7923 2.6000
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.400e+00 4.372e-01 5.489 8.56e-08 ***
## ethnicityamerican_indian 1.400e+00 6.558e-01 2.135 0.033589 *
## ethnicityarab 2.600e+00 1.071e+00 2.428 0.015778 *
## ethnicityasian 1.644e+00 5.453e-01 3.016 0.002782 **
## ethnicitybiracial 1.844e+00 5.453e-01 3.382 0.000813 ***
## ethnicitylatinx 1.318e+00 4.974e-01 2.649 0.008493 **
## ethnicityother -2.621e-14 1.071e+00 0.000 1.000000
## ethnicitypacific_islander 1.333e-01 7.140e-01 0.187 0.851983
## ethnicityprefer_not_answer 6.500e-01 6.558e-01 0.991 0.322422
## ethnicitywhite 1.608e+00 4.414e-01 3.642 0.000318 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.9557959)
##
## Null deviance: 316.14 on 311 degrees of freedom
## Residual deviance: 288.65 on 302 degrees of freedom
## AIC: 883.15
##
## Number of Fisher Scoring iterations: 2
This shows that American Indian, Arab, Asian, Biracial, Latinx, and
White students report significantly higher average Self-Efficacy than
the reference group.
One last model: let’s look at Self-Efficacy according to the presence
or absence of a medical condition.
##
## Call:
## glm(formula = self_efficacy_ave ~ med_condition, data = df)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.93846 -0.33846 0.06154 0.82564 1.06154
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.93846 0.06112 64.443 <2e-16 ***
## med_conditionYes 0.03590 0.17286 0.208 0.836
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 1.019679)
##
## Null deviance: 316.14 on 311 degrees of freedom
## Residual deviance: 316.10 on 310 degrees of freedom
## AIC: 895.49
##
## Number of Fisher Scoring iterations: 2
This shows that having a medical condition does not have a significant influence on Self-Efficacy.
Now that we have these four models for the demographic variables we
are interested in, let’s compare them against each other to see which
best predicts Self-Efficacy.
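One way to build a side-by-side coefficient table like the one below is `compare_parameters()` from the `parameters` package (an assumption based on the table layout; other packages such as `modelsummary` produce similar output):

```r
library(parameters)

# Side-by-side coefficient table for the four demographic models
# (assumes se_mod_1..se_mod_4 are the glm fits shown above).
compare_parameters(se_mod_1, se_mod_2, se_mod_3, se_mod_4)
```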
## Parameter | se_mod_1 | se_mod_2 | se_mod_3 | se_mod_4
## -----------------------------------------------------------------------------------------------------
## (Intercept) | 4.26*** (0.12) | 4.10*** (0.15) | 2.40*** (0.44) | 3.94*** (0.06)
## major (Computer_sci) | 0.14 (0.39) | | |
## major (Earth_sci) | 0.74 (1.00) | | |
## major (Engineering) | 0.02 (0.28) | | |
## major (Chemistry) | -0.32 (0.39) | | |
## major (Mathematics) | -0.16 (0.71) | | |
## major (Non_STEM) | -0.35 (0.24) | | |
## major (Other) | -0.50* (0.21) | | |
## major (Health_sci) | -0.48*** (0.14) | | |
## career (engineer) | | 0.08 (0.30) | |
## career (health_care_pro) | | -0.36* (0.17) | |
## career (medical_doctor) | | 5.56e-04 (0.19) | |
## career (educator) | | -0.29 (0.31) | |
## career (researcher) | | 0.41 (0.41) | |
## career (scientist) | | -0.04 (0.32) | |
## career (technician) | | 0.20 (0.43) | |
## career (non_stem) | | -0.35 (0.27) | |
## ethnicity (asian) | | | 1.64** (0.55) |
## ethnicity (biracial) | | | 1.84*** (0.55) |
## ethnicity (american_indian) | | | 1.40* (0.66) |
## ethnicity (arab) | | | 2.60* (1.07) |
## ethnicity (pacific_islander) | | | 0.13 (0.71) |
## ethnicity (prefer_not_answer) | | | 0.65 (0.66) |
## ethnicity (latinx) | | | 1.32** (0.50) |
## ethnicity (other) | | | -2.62e-14 (1.07) |
## ethnicity (white) | | | 1.61*** (0.44) |
## med condition (Yes) | | | | 0.04 (0.17)
## -----------------------------------------------------------------------------------------------------
## Observations | 312 | 312 | 312 | 312
This table gathers everything we have fit thus far; now
let’s add model-performance statistics to it.
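The heading of the table below matches the output of `compare_performance()` from the `performance` package, so a sketch of the call would be:

```r
library(performance)

# Compare AIC, BIC, their Akaike weights, R2, RMSE, and sigma
# across the four single-predictor models.
compare_performance(se_mod_1, se_mod_2, se_mod_3, se_mod_4)
```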
## # Comparison of Model Performance Indices
##
## Name | Model | AIC | AIC weights | BIC | BIC weights | R2 | RMSE | Sigma
## --------------------------------------------------------------------------------------------
## se_mod_1 | glm | 892.545 | 0.009 | 929.975 | 8.92e-06 | 0.053 | 0.980 | 0.994
## se_mod_2 | glm | 896.478 | 0.001 | 933.908 | 1.25e-06 | 0.041 | 0.986 | 1.000
## se_mod_3 | glm | 883.148 | 0.988 | 924.321 | 1.51e-04 | 0.087 | 0.962 | 0.978
## se_mod_4 | glm | 895.491 | 0.002 | 906.720 | 1.000 | 1.391e-04 | 1.007 | 1.010
To put all of this in plain English, we are most interested in models
with higher R^2 values and lower AIC, BIC, and RMSE values. By those
criteria, the ethnicity model (se_mod_3) performs best on AIC, R^2,
and RMSE, while the medical-condition model (se_mod_4) has the lowest
BIC, largely because of its parsimony. Let’s fit all of these
predictors together so we can see how they relate.
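The combined model below simply adds all four predictors additively, as the `Call:` line in its output shows (the object name `se_mod_full` is an assumption):

```r
# Additive model with all four demographic predictors.
se_mod_full <- glm(self_efficacy_ave ~ major + ethnicity + career +
                     med_condition, data = df)
summary(se_mod_full)
```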
##
## Call:
## glm(formula = self_efficacy_ave ~ major + ethnicity + career +
## med_condition, data = df)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.1749 -0.3749 0.2055 0.6182 2.7232
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.917731 0.479901 6.080 3.85e-09 ***
## majorChemistry -0.502638 0.408743 -1.230 0.219818
## majorComputer_sci 0.200983 0.478514 0.420 0.674791
## majorEarth_sci 0.532371 1.055595 0.504 0.614417
## majorEngineering 0.375505 0.577113 0.651 0.515789
## majorHealth_sci -0.417848 0.159684 -2.617 0.009352 **
## majorMathematics -0.182061 0.698871 -0.261 0.794661
## majorNon_STEM 0.062449 0.408616 0.153 0.878640
## majorOther -0.405238 0.234018 -1.732 0.084418 .
## ethnicityamerican_indian 1.471611 0.658626 2.234 0.026234 *
## ethnicityarab 2.119737 1.072151 1.977 0.048995 *
## ethnicityasian 1.632796 0.553118 2.952 0.003420 **
## ethnicitybiracial 1.642634 0.556800 2.950 0.003440 **
## ethnicitylatinx 1.209543 0.498482 2.426 0.015868 *
## ethnicityother -0.480263 1.072151 -0.448 0.654534
## ethnicitypacific_islander 0.160269 0.713446 0.225 0.822420
## ethnicityprefer_not_answer 0.445674 0.655239 0.680 0.496949
## ethnicitywhite 1.494614 0.442234 3.380 0.000827 ***
## careereducator -0.255685 0.338821 -0.755 0.451094
## careerengineer -0.588764 0.585327 -1.006 0.315330
## careerhealth_care_pro -0.223101 0.175434 -1.272 0.204512
## careermedical_doctor -0.037468 0.193706 -0.193 0.846764
## careernon_stem -0.699786 0.431068 -1.623 0.105614
## careerresearcher 0.055284 0.446979 0.124 0.901653
## careerscientist -0.266564 0.333683 -0.799 0.425040
## careertechnician -0.003913 0.480447 -0.008 0.993508
## med_conditionYes 0.103215 0.174851 0.590 0.555456
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.939999)
##
## Null deviance: 316.14 on 311 degrees of freedom
## Residual deviance: 267.90 on 285 degrees of freedom
## AIC: 893.87
##
## Number of Fisher Scoring iterations: 2
This gives us a great starting point for further data analysis in our
project.